We investigate a state discrimination problem in operationally the most general framework that
uses probability, which includes classical and quantum theories and more. In this broad framework,
we introduce a family of ensembles closely related to the problem (which we call a Helstrom family
of ensembles) and provide a geometrical method to find an optimal measurement for state
discrimination by means of a Bayesian strategy. We illustrate our method in 2-level quantum systems
and in a probabilistic model with a square state space, reproducing, e.g., the optimal success
probabilities for binary state discrimination and for N symmetric quantum states. The existence of
Helstrom families of ensembles in the binary case is shown for both classical and quantum theories
in all generic cases.



I. INTRODUCTION 

Among many attempts to understand quantum theory 
axiomatically, an operationally natural approach has at- 
tracted increasing attention in the recent development of 
quantum information theory p], 0, H, 13, [H • By construct- 
ing a general framework of theories to include not only 
classical and quantum theories but also more general the- 
ories, one can reconsider the nature of quantum theory 
from outside, preferably with the operational and infor- 
mational point of view. This also enables us to prepare 
for a (possible) post-quantum theory in the future. For 
instance, it is important to find conditions to achieve a se- 
cure key distribution in a general framework [6(1. Among 
others, the convexity or operational approach Q, or re- 
cently referred as "general (or generic) probabilistic the- 
ories (or models)" @, ©], is considered to provide op- 
erationally the most general theory for probability. Of 
course, both classical probability theory and quantum 
theory are included as typical examples of general prob- 
abilistic theories, but it is known that there exist other 
possible physical models for probability (See an example 
in Sec. IV B). 

Although this approach has a relatively long history
[10, 11], many fundamental problems remain open, especially
from the applicational and informational points of
view. This may not be surprising if one
recalls that quantum information theory has given new 
insights and provided attractive problems on the founda- 
tion and application of quantum mechanics. One of them 
is a state discrimination problem. The problem asks how 
well a given ensemble of states is distinguishable. It has 
been one of the most important questions in quantum 
information theory, and there are various formulations 
of the problem depending on measures to characterize 
the quality of discrimination [12, EH HH EH ■ The property
that no measurement can perfectly distinguish
non-orthogonal pure states plays an essential role in
various protocols such as quantum key distribution [16],
and is often considered the most remarkable feature
of quantum theory. On the other hand, in the context of
general probabilistic theories, the property can charac- 
terize the nature of classical theory. Indeed, it is known 
that a general probabilistic theory is a classical theory if 
and only if all the pure states can be perfectly discrimi- 
nated in a single measurement 0]. 

In this paper, we discuss an optimal state discrimination
problem in general probabilistic theories by means of a
Bayesian strategy. While the existence of Bayes optimal
measurements has been discussed in a general setting [17],
we provide a geometrical method to find such an optimal
measurement and the optimal success probability. Our figure
of merit is the optimal success probability in discriminating
N states under a given prior distribution. We introduce a
useful family of ensembles, which we call a Helstrom family
of ensembles, in any general probabilistic theory. It
generalizes a family of ensembles used in [18] in 2-level
quantum systems for binary state discrimination, and we show
that the family enables us to obtain optimal measurements by
means of a Bayesian strategy. This method reveals that a
certain geometrical relation between the state space and the
convex subset generated by the states which we want to
distinguish is crucial for the problem of state
discrimination: In the case of a uniform prior distribution,
what one has to do is to find as large a convex subset
(composed of a Helstrom family of ensembles) as possible in
the state space which is reverse homothetic to the convex
subset generated by the states under consideration. The
existence of Helstrom families for N = 2, which again has a
simple geometrical interpretation, is shown in both classical
and quantum systems in generic cases. Some other works on the
problem in quantum theory are related to our purpose: the
no-signaling condition was used in deriving the optimal
success probability between two states in 2-level quantum
systems, a bound on the optimal success probability [19], and
a maximal confidence [20] among several non-orthogonal states
in general quantum systems. In particular, we discuss the
relation between our method and the one used in [18], and
show that our method generalizes the results in [18] to
general probabilistic theories.

The paper is organized as follows. In Sec. II, we give a
brief review of general probabilistic theories. In Sec. III,
we introduce a Helstrom family of ensembles and show its
relation to an optimal measurement in the state
discrimination problem (Propositions 1 and 2, Theorem 1). We
also prove the existence of the families of ensembles for
N = 2 in classical and quantum systems in generic cases
(Theorems 2 and 3). In Sec. IV, we illustrate our method
in 2-level quantum systems, and reproduce the optimal
success probabilities for binary state discrimination and
for N symmetric quantum states. As an example of a theory
that is neither classical nor quantum, we introduce a
general probabilistic model with a square state space. Our
method is also applied to this model to exemplify its
usability. In Sec. V, we summarize our results.



II. BRIEF REVIEW OF GENERAL 
PROBABILISTIC THEORIES 

In order to overview general probabilistic theories as
the operationally most general theories of probability, let
us start from a very primitive consideration of physical
theories where probability plays a fundamental role. In
such a theory, a particular rule (like the Born rule in
quantum mechanics) to obtain a probability for some output
when measuring an observable o under a state s should be
provided. Therefore, states and observables are two
fundamental ingredients, together with an appropriate
physical law to obtain probabilities, in general
probabilistic theories. Let us denote the set of states by S.
In a simplified view, an N-valued observable o [26] can be
considered as N maps $o_i$ on a state space S so that
$o_i(s) \in [0, 1]$ provides the probability to obtain the ith
output when measuring this observable under a state
$s \in S$. It is operationally natural to assume that if one
can prepare states $s \in S$ and $t \in S$, then there exists a
probabilistic-mixture state $\langle \lambda, s, t \rangle \in S$
for any $\lambda \in [0, 1]$, which represents an ensemble of
preparing state s with probability $\lambda$ and state t with
probability $1 - \lambda$. Furthermore, it is natural to assume
the so-called separating condition for states; namely, two
states $s_1$ and $s_2$ should be identified when there are no
observables to statistically distinguish them. Then, it has
been shown 0, H3 that without loss of generality, the state
space S is embedded into a convex (sub)set in a real vector
space V such that a probabilistic-mixture state is given by a
convex combination
$\langle \lambda, s, t \rangle = \lambda s + (1 - \lambda) t$ [27].
Hence, hereafter the state space S is assumed to be a convex
set in a real vector space V with the above mentioned
interpretation. An extreme point [28] of a state space S
is called a pure state, otherwise a mixed state. Physically,
a pure state is a state which cannot be prepared as
an ensemble of different states. From the preparational
point of view for the state
$\langle \lambda, s, t \rangle = \lambda s + (1 - \lambda) t$, each
map $o_i$ of an observable o should be an affine functional:
$o_i(\lambda s + (1 - \lambda) t) = \lambda o_i(s) + (1 - \lambda) o_i(t)$,
since the right hand side is a sum of probabilities to obtain
the ith output for the exclusive events of states s and t
with probability $\lambda$ and $1 - \lambda$, while $o_i(s)$,
$o_i(t)$ are conditional probabilities to obtain the ith
output conditioned that the states are s and t, respectively.
An effect e on S is an affine functional from S to [0, 1].
There are two trivial effects, the unit effect u and the zero
effect 0, defined by $u(s) = 1$, $0(s) = 0$ for all $s \in S$.
With this language, an N-valued observable o is a set of
effects $o_i$ (i = 1, ..., N) satisfying
$\sum_{i=1}^{N} o_i = u$, meaning that $o_i(s)$ is the
probability to obtain the ith output when measuring the
observable o in the state s. We denote by $\mathcal{E}$ and
$\mathcal{O}_N$ the sets of all the effects and N-valued
observables, respectively. While the outputs of an observable
can be not only real numbers but also any symbols, like
"head" or "tail", hereafter we often identify them with
{1, ..., N}. A physically natural topology on S is given by
the (weakest) topology in which all the effects are
continuous. Without loss of generality [22], S is assumed to
be compact with respect to this topology. Typical examples of
general probabilistic theories are classical and quantum
systems. For simplicity, the classical and quantum systems we
consider in this paper will be finite systems:

[Example 1: Classical Systems] A finite classical system is
described by a finite probability theory. Let
$\Omega = \{\omega_1, \dots, \omega_d\}$ be a finite sample space.
A state is a probability distribution $p = (p_1, \dots, p_d)$,
meaning that the probability to observe $\omega_i$ is $p_i$.
Therefore, the state space is
$S_{\rm cl} = \{p = (p_1, \dots, p_d) \in \mathbb{R}^d \mid p_i \ge 0, \sum_i p_i = 1\} \subset \mathbb{R}^d$,
and forms a (standard) simplex [29]. The set of extreme points
is $\{p^{(i)}\}_{i=1}^{d}$ where $p^{(i)}_j = \delta_{ij}$. An
effect e is given by a random variable $f : \Omega \to [0, 1]$
such that $e(p) = \sum_i f(\omega_i) p_i$
($0 \le f(\omega_i) \le 1$).

[Example 2: Quantum Systems] A d-level quantum system is
described by a d-dimensional Hilbert space $\mathcal{H}$.
A state is described by a density operator $\rho$, a Hermitian
positive operator on $\mathcal{H}$ with unit trace, and the
state space is given by
$S_{\rm qu} = \{\rho \in \mathcal{L}_h(\mathcal{H}) \mid \rho \ge 0, \mathrm{tr}\,\rho = 1\} \subset \mathcal{L}_h(\mathcal{H})$;
here the real vector space $\mathcal{L}_h(\mathcal{H})$ is the set
of all Hermitian operators on $\mathcal{H}$. A pure state is a
one-dimensional projection operator onto a unit vector
$\psi \in \mathcal{H}$, written as
$\rho = |\psi\rangle\langle\psi|$ in Dirac notation. An effect e
is described [13, EH by a positive operator B such that
$0 \le B \le I$ through $e(\rho) = \mathrm{tr}(B\rho)$, which is
called an element of a positive-operator-valued measure
(POVM) [30].

In the following, we assume that all observables
$\{o_i\}_{i=1}^{N}$ composed of effects $o_i$ satisfying
$\sum_i o_i = u$ are in principle measurable. Then, only the
structure of the state space characterizes the general
probabilistic theory. Roughly speaking, for each (compact)
convex set one can consider the corresponding general
probabilistic model. When we consider a composition of state
spaces, the so-called no-signaling condition is usually
required to preserve causality.

We refer to 0JlJJ for the details of general probabilistic
theories and to [8], where generalized no-broadcasting and
no-cloning theorems have been shown in general
probabilistic theories.



III. HELSTROM FAMILY OF ENSEMBLES IN 
GENERAL PROBABILISTIC THEORIES 

As state discrimination is one of the central problems
in quantum information theory, we consider the problem
of discriminating states in general probabilistic theories by
means of a Bayesian strategy. Suppose Alice is given a
state chosen from $\{s_i \in S\}_{i=1}^{N}$ with a prior
probability distribution $\{p_i \in \mathbb{R}\}_{i=1}^{N}$
($p_i \ge 0$, $\sum_i p_i = 1$), and her goal is to guess the
state. She wants to find an optimal measurement to maximize
the success probability. Without loss of generality, it is
sufficient to consider an N-valued observable
$E = \{e_i\}_{i=1}^{N} \in \mathcal{O}_N$ from which she decides
that the state was $s_i$ when obtaining the output i.
Then, the success probability is

$$P_S(E) = \sum_{i=1}^{N} p_i\, e_i(s_i). \qquad (1)$$

The optimal success probability $P_S$ is given by optimizing
$P_S(E)$ among all the N-valued observables:

$$P_S = \sup_{E \in \mathcal{O}_N} P_S(E). \qquad (2)$$

For a binary discrimination (N = 2), it can be written
as

$$P_S = p_2 + \sup_{e \in \mathcal{E}} \left[ p_1 e(s_1) - p_2 e(s_2) \right], \qquad (3)$$

where in the final expression we have used $e_1 + e_2 = u$.
This problem is well investigated in quantum mechanics,
and the optimal success probability to discriminate two
distinct density operators $\rho_1$, $\rho_2$ with a prior
distribution $p_1$, $p_2$ is given by

$$P_S^{(Q)} = p_2 + \sup_{0 \le E \le I} \mathrm{tr}\left[ E (p_1 \rho_1 - p_2 \rho_2) \right]
= \frac{1}{2}\left( 1 + \| p_1 \rho_1 - p_2 \rho_2 \|_1 \right). \qquad (4)$$

Here, the norm is the trace norm defined by
$\|A\|_1 := \mathrm{tr}|A| = \mathrm{tr}\sqrt{A^\dagger A}$. Since
this bound is sometimes referred to as the Helstrom bound, let
us call $P_S$ in Eq. (2) also the Helstrom bound for any N and
for any general probabilistic theory.

In order to obtain the Helstrom bound in general
probabilistic theories, we shall introduce a family of
ensembles which is later shown to be closely related to the
optimization problem in Eq. (2). In the following, we assume
that the prior probability distribution satisfies
$p_i \ne 0, 1$, removing trivial cases:



Definition 1 Given N distinct states $\{s_i \in S\}_{i=1}^{N}$
and a prior probability distribution $\{p_i\}_{i=1}^{N}$, we call
a family of N ensembles
$\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$ (i = 1, ..., N)
a "weak Helstrom family of ensembles" (or simply a
"weak Helstrom family") for states $\{s_i\}$ and a probability
$\{p_i\}$ if there exist N binary probability distributions
$\{\tilde{p}_i, 1 - \tilde{p}_i\}$ ($0 < \tilde{p}_i \le 1$) and
N states $\{t_i \in S\}_{i=1}^{N}$ satisfying

i) $\frac{p_i}{\tilde{p}_i} = \frac{p_j}{\tilde{p}_j} \le 1$, (5)

ii) $\tilde{p}_i s_i + (1 - \tilde{p}_i) t_i = \tilde{p}_j s_j + (1 - \tilde{p}_j) t_j$, (6)

for any i, j = 1, ..., N.

Note that condition (6) means that the N ensembles
$\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$ are statistically
equivalent (among observables). Therefore, a weak Helstrom
family is a family of statistically equivalent ensembles which
are mixtures of states $\{s_i\}$ and $\{t_i\}$ with weights
$\tilde{p}_i$ and $1 - \tilde{p}_i$ satisfying condition (5). We
call $t_i$ a conjugate state to $s_i$. The probabilistic-mixture
state determined by the N ensembles
$\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$ with condition (6) is
called a reference state and is denoted by s:

$$s := \tilde{p}_i s_i + (1 - \tilde{p}_i) t_i \quad (\forall i = 1, \dots, N). \qquad (7)$$

We call the ratio $p \le 1$ defined by

$$p := \frac{p_i}{\tilde{p}_i} \quad (\forall i = 1, \dots, N) \qquad (8)$$

a Helstrom ratio, which turns out to play an important role
in an optimal state discrimination. We call a weak Helstrom
family a trivial (resp. nontrivial) family when p = 1
(resp. p < 1).

Note that a weak Helstrom family always exists for any
distinct states $\{s_i\}$ and a prior probability distribution
$\{p_i\}$. For instance, it is easy to see that
$\tilde{p}_i = p_i$ (p = 1) and
$t_i = \frac{1}{1 - p_i} \sum_{j \ne i} p_j s_j$ give a weak
Helstrom family of ensembles with a reference state
$s = \sum_i p_i s_i$, although it is a trivial family. (See
later examples for nontrivial families.) Moreover, if
$\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$ (i = 1, ..., N) is
a weak Helstrom family with a Helstrom ratio p < 1 and
a reference state s, then for any $p < p' \le 1$, one can
construct another weak Helstrom family with a Helstrom ratio
$p'$. Indeed, since
$0 \le \frac{1 - \tilde{p}_i}{1 - \tilde{p}'_i} \le 1$ for
$\tilde{p}'_i := p_i / p'$ (< 1), one can take conjugate states
as
$t'_i := \frac{1 - \tilde{p}_i}{1 - \tilde{p}'_i} t_i + \left(1 - \frac{1 - \tilde{p}_i}{1 - \tilde{p}'_i}\right) s_i$.
Then it is easy to see that the family
$\{\tilde{p}'_i, s_i; 1 - \tilde{p}'_i, t'_i\}$ is a weak
Helstrom family with a Helstrom ratio $p'$ and the same
reference state s.
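The trivial family and the passage from a Helstrom ratio p to a larger ratio p' can be checked on a small classical example. The following is only an illustrative sketch: the helper names `mix`, `trivial_family` and `rescale`, and the two-outcome example states, are assumptions introduced here and do not appear in the original text. Exact rational arithmetic is used so that reference states can be compared with `==`.

```python
from fractions import Fraction as F

def mix(w, a, b):
    """Convex combination w*a + (1-w)*b of two state vectors."""
    return tuple(w * x + (1 - w) * y for x, y in zip(a, b))

def trivial_family(states, prior):
    """Trivial weak Helstrom family: weights equal the priors (ratio p = 1)."""
    dim = len(states[0])
    family = []
    for i, p_i in enumerate(prior):
        # t_i = (sum_{j != i} p_j s_j) / (1 - p_i)
        t_i = tuple(
            sum(prior[j] * states[j][k] for j in range(len(states)) if j != i) / (1 - p_i)
            for k in range(dim)
        )
        family.append((p_i, states[i], t_i))
    return family

def rescale(family, prior, p_new):
    """From a weak family (w_i, s_i, t_i) with ratio p, build one with ratio p_new > p."""
    out = []
    for (w, s_i, t_i), p_i in zip(family, prior):
        w_new = p_i / p_new                # new weight, smaller than w
        c = (1 - w) / (1 - w_new)          # lies in [0, 1] because w_new <= w
        out.append((w_new, s_i, mix(c, t_i, s_i)))  # t'_i = c t_i + (1 - c) s_i
    return out

# Hypothetical example on a classical bit (states are probability vectors).
prior = (F(1, 2), F(1, 2))
states = ((F(3, 4), F(1, 4)), (F(1, 4), F(3, 4)))
# A nontrivial weak Helstrom family with ratio p = 3/4 (weights 2/3).
family = [(F(2, 3), states[0], (F(0), F(1))),
          (F(2, 3), states[1], (F(1), F(0)))]

refs_trivial = [mix(w, s, t) for w, s, t in trivial_family(states, prior)]
refs_rescaled = [mix(w, s, t) for w, s, t in rescale(family, prior, F(9, 10))]
```

In both cases all ensembles share one reference state, which is what condition (6) requires.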

Let us explain the geometrical meaning of a weak Helstrom
family of ensembles, which makes it easier to find one.
First we explain this for the most interesting cases
in the context of state discrimination, i.e., those with
the uniform probability distribution $p_i = 1/N$. In these
cases, condition (5) tells us that all $\tilde{p}_i$ should
take the same value $q := \tilde{p}_i = \frac{1}{Np}$, and
condition (6) geometrically means that all $t_i$ should be
located in S such that every pair $s_i$ and $t_i$ has the
common interior point s dividing the segment with the same
ratio q. The global picture is that one has to find $t_i$
so that the polytopes $X = \mathrm{conv}_{i=1,\dots,N}[t_i]$, as a
subset of S, and $Y = \mathrm{conv}_{i=1,\dots,N}[s_i]$ possess
the internal homothetic center s in S, so that the polytopes X
and Y are geometrically similar to one another with the
similarity ratio $\frac{q}{1-q}$ (of X relative to Y).
Fig. 1 [A] illustrates an example for N = 3
with the uniform distribution. One immediately recognizes
the reverse homothety between the two polytopes (triangles)
generated by $\{s_i\}$ and $\{t_i\}$ with the internal
homothetic center s. As is shown later, it is preferable to
find a weak Helstrom family with p as small as possible (and
hence q as large as possible). Therefore, if one knows
the global image of the state space S, then finding as large
a polygon as possible in S which is reverse homothetic to
the polygon generated by $\{s_i\}$ will provide a good
weak Helstrom family. Another simple algorithm to find
a weak Helstrom family is the following: First choose
freely a reference state s, and draw a line from each $s_i$
through s to the point $t_i$ in S for which s is the
interior point dividing the segment with the common ratio q
and 1 - q (see Fig. 1 [A] for N = 3). Then, with conjugate
states as end-points of these lines, one obtains a weak
Helstrom family $\{q, s_i; 1 - q, t_i\}$ with a Helstrom ratio
$p = \frac{1}{Nq}$.

With a general prior probability distribution $\{p_i\}$, an
algorithm to find a (possibly nontrivial) weak Helstrom
family with p as small as possible is as follows: Take a
reference state in S, e.g., $s = \sum_i p_i s_i$. Extend a line
from each $s_i$ (i = 1, ..., N) passing through s until the
line reaches the boundary of S. Let $u_i$ be such states on
the boundary and let $0 < q_i \le 1$ be the ratio so that
$s = q_i s_i + (1 - q_i) u_i$. Then, conjugate states $t_i$ on
each line satisfying
$s = \tilde{p}_i s_i + (1 - \tilde{p}_i) t_i$ with
$\tilde{p}_i := \frac{p_i q_{i_0}}{p_{i_0}}$, where
$i_0 := \mathrm{argmax}_{i=1,\dots,N}\left[\frac{p_i}{q_i}\right]$,
give a (nontrivial) weak Helstrom family of ensembles with a
Helstrom ratio $p = \frac{p_{i_0}}{q_{i_0}}$. (The choice of
$i_0$ guarantees $\tilde{p}_i \le q_i$ for every i, so that
each $t_i$ stays inside S.) Notice that for general cases, the
similarity between the two polytopes generated by $\{s_i\}$ and
$\{t_i\}$ is distorted. (See Fig. 1 [B] for N = 3.)
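The algorithm for a general prior can be sketched in code for one concrete state space. The example below assumes, purely for illustration, that S is the unit disc in $\mathbb{R}^2$; the function name `weak_helstrom_family_disc` and the sample states are hypothetical. It also assumes that the reference state $s = \sum_i p_i s_i$ lies strictly inside the disc and differs from every $s_i$.

```python
import math

def weak_helstrom_family_disc(states, prior):
    """Weak Helstrom family for states in the unit disc, reference s = sum_i p_i s_i."""
    s = tuple(sum(p * st[k] for p, st in zip(prior, states)) for k in range(2))
    q = []
    for s_i in states:
        dx, dy = s[0] - s_i[0], s[1] - s_i[1]
        # Ray s_i + tau*(dx, dy): tau = 1 is s, the larger root tau_u hits the unit circle.
        a = dx * dx + dy * dy
        b = 2 * (s_i[0] * dx + s_i[1] * dy)
        c = s_i[0] ** 2 + s_i[1] ** 2 - 1
        tau_u = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)
        q.append(1 - 1 / tau_u)            # s = q_i s_i + (1 - q_i) u_i
    p = max(p_i / q_i for p_i, q_i in zip(prior, q))   # Helstrom ratio p_{i0} / q_{i0}
    weights = [p_i / p for p_i in prior]               # tilde p_i = p_i q_{i0} / p_{i0}
    conj = [
        tuple((s[k] - w * s_i[k]) / (1 - w) for k in range(2))  # solve (7) for t_i
        for w, s_i in zip(weights, states)
    ]
    return s, weights, conj, p

# Hypothetical example: three states with the prior of Fig. 1 [B].
states = [(0.5, 0.0), (-0.25, 0.4), (-0.25, -0.4)]
prior = [1 / 6, 1 / 3, 1 / 2]
s, weights, conj, p = weak_helstrom_family_disc(states, prior)
```

By Proposition 1 below, the returned `p` is an upper bound on the optimal success probability for these states in the assumed disc model; it is not claimed to be tight.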

In the following, we show that a weak Helstrom family
of ensembles is closely related to an optimal state
discrimination strategy, and provide a geometrical method
to obtain the Helstrom bound $P_S$ and an optimal
measurement in any general probabilistic theory.

Let us again consider a state discrimination problem
for $\{s_i \in S\}_{i=1}^{N}$ with a prior distribution
$\{p_i\}_{i=1}^{N}$. Let $E = \{e_i\}_{i=1}^{N}$ be any N-valued
observable from which Alice decides that the state is $s_i$ if
she observes an output i. Suppose that we have a weak Helstrom
family $\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$
(i = 1, ..., N) with the reference state
$s = \tilde{p}_i s_i + (1 - \tilde{p}_i) t_i$ (i = 1, ..., N) and
a Helstrom ratio $p = p_i / \tilde{p}_i$. Then, using
$u = \sum_i e_i$, the affinity of $e_i$, and Eq. (7),
it follows

$$1 = u(s) = \sum_i e_i(s) = \sum_i e_i\big(\tilde{p}_i s_i + (1 - \tilde{p}_i) t_i\big)
= \frac{1}{p} \sum_i p_i e_i(s_i) + \sum_i (1 - \tilde{p}_i) e_i(t_i)
= \frac{1}{p} P_S(E) + \sum_i (1 - \tilde{p}_i) e_i(t_i). \qquad (9)$$

FIG. 1: Let S be a convex set in $\mathbb{R}^2$ as depicted in the
figures. For three distinct states $\{s_1, s_2, s_3\}$ in S,
nontrivial weak Helstrom families are illustrated [A] for the
uniform distribution $p_1 = p_2 = p_3 = 1/3$ and [B] for
$p_1 = 1/6$, $p_2 = 1/3$, $p_3 = 1/2$, where the Helstrom ratios
are [A] $p = 1/(3q) = 1/2$ ($q = \tilde{p}_i = 2/3$) and
[B] $p = 2/3$. In [A], the two polytopes (triangles) generated
by $\{s_i\}$ and $\{t_i\}$ are reverse homothetic to one another
with the similarity point s, while in [B], these polytopes are
distorted homothetic depending on the prior distribution.

Since $\sum_i (1 - \tilde{p}_i) e_i(t_i) \ge 0$, we obtain

$$P_S(E) \le p, \qquad (10)$$

which holds for any observables E. Thus we have proved 
the following proposition. 

Proposition 1 Let $\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$
(i = 1, ..., N) be a weak Helstrom family of ensembles with a
Helstrom ratio $p = p_i / \tilde{p}_i$. Then, we have a bound
for the Helstrom bound: $P_S \le p$.

This means that, once we find a weak Helstrom family
of ensembles, an upper bound on the Helstrom bound is
automatically obtained. A trivial weak Helstrom family gives
the trivial condition $P_S \le p = 1$, which is the reason we
called it trivial. Examples of nontrivial weak Helstrom
families are given in Fig. 1, where [A] $P_S \le p = 1/2$
and [B] $P_S \le p = 2/3$. Namely, the optimal success
probability in this general probabilistic model is at
most 1/2 and 2/3 for [A] $p_1 = p_2 = p_3 = 1/3$ and [B]
$p_1 = 1/6$, $p_2 = 1/3$, $p_3 = 1/2$, respectively.

Moreover, Proposition 1 leads us to the useful notion of a
Helstrom family of ensembles, defined as follows:

Definition 2 Let $\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$
(i = 1, ..., N) be a weak Helstrom family of ensembles for N
distinct states $\{s_i\}$ and a prior probability distribution
$\{p_i\}$. We call it a Helstrom family of ensembles if the
Helstrom ratio $p = p_i / \tilde{p}_i$ attains the Helstrom
bound: $P_S = p$.

From Eq. (9), an observable E satisfies $P_S(E) = p$
if $e_i(t_i) = 0$ for any i = 1, ..., N. Then, it follows
$p = P_S(E) \le P_S \le p$. Consequently, we have

Proposition 2 A sufficient condition for a weak Helstrom
family of ensembles
$\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$ (i = 1, ..., N)
to be a Helstrom family is that there exists an observable
$E = \{e_i\}_{i=1}^{N}$ satisfying $e_i(t_i) = 0$ for all
i = 1, ..., N. In this case, the observable E gives an optimal
measurement to discriminate $\{s_i\}$ with a prior distribution
$\{p_i\}$.

FIG. 2: Geometrical appearance of two distinguishable states
$t_1$, $t_2$.

Two states $t_1, t_2 \in S$ are said to be distinguishable if
there exists an observable $E = \{e_1, e_2\}$ which
discriminates $t_1$ and $t_2$ with certainty (for any prior
distributions), or equivalently satisfies

$$e_1(t_1) = 1,\ e_1(t_2) = 0 \quad (e_2(t_1) = 0,\ e_2(t_2) = 1). \qquad (11)$$

Therefore, as a corollary of Proposition 2, we obtain the
following theorem for a binary state discrimination (N = 2).

Theorem 1 Let $\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$
(i = 1, 2) be a weak Helstrom family of ensembles for states
$s_1, s_2 \in S$ and a binary probability distribution
$p_1, p_2$ such that $t_1$ and $t_2$ are distinguishable states.
Then, $\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$ (i = 1, 2) is
a Helstrom family with the Helstrom ratio $p = P_S$.
An optimal measurement to distinguish $s_1$ and $s_2$ is given
by an observable to distinguish $t_1$ and $t_2$.

Proof The distinguishability of $t_1$ and $t_2$ satisfies the
sufficient condition in Proposition 2: exchanging the two
outcome labels of an observable satisfying (11) gives effects
with $e_1(t_1) = e_2(t_2) = 0$. ■
Let us consider the case where S is a subset of a finite
dimensional real vector space V. From condition (11), the
geometrical meaning of two distinguishable states $t_1, t_2$
is that they are on the boundary of S and possess parallel
supporting hyperplanes (see Fig. 2). Here, a supporting
hyperplane at a point $s \in S$ is a hyperplane $H \subset V$
such that $s \in H$ and S is contained in one of the two closed
half-spaces of the hyperplane [23]. Indeed, if there exist two
parallel supporting hyperplanes $H_1$ and $H_2$ at
$t_1 \in S$ and $t_2 \in S$ respectively, one can construct an
affine functional f on V such that $f(x) = 1$ for $x \in H_1$
and $f(y) = 0$ for $y \in H_2$. Then, the restriction of f to S
is an effect which distinguishes $t_1$ and $t_2$ with
certainty, since S is contained between $H_1$ and $H_2$ and
$f(t_1) = 1$, $f(t_2) = 0$. Then, to find a Helstrom family
of ensembles given in Theorem 1 is nothing but a simple
geometrical task. Here, we explain this in the uniform
distribution cases: From the definition of a (weak) Helstrom
family of ensembles and Theorem 1, two ensembles
$\{\tilde{p}_i, s_i; 1 - \tilde{p}_i, t_i\}$ (i = 1, 2) for
distinct states $s_1, s_2 \in S$ with the uniform distribution
$p_1 = p_2 = 1/2$ are ensembles of a Helstrom family if
$t_1, t_2$ are distinguishable and

$$s := q s_1 + (1 - q) t_1 = q s_2 + (1 - q) t_2, \qquad (12)$$

with some $0 < q := \tilde{p}_1 = \tilde{p}_2 < 1$. From (12),
$s_1 - s_2$ and $t_1 - t_2$ should be parallel, and therefore
one easy way to find a Helstrom family is as follows: search
for conjugate states $t_1$ and $t_2$ on the boundary of S which
are on a line parallel to $s_1 - s_2$ such that there exist
parallel supporting hyperplanes at $t_1$ and $t_2$. Then, the
crossing point of the segments from $s_1$ to $t_1$ and from
$s_2$ to $t_2$ is a reference state s, while the ratio between
$s_1 - s$ ($s_2 - s$) and $s - t_1$ ($s - t_2$) determines the
Helstrom ratio $p = \frac{1}{2q}$. In Fig. 3, Helstrom families
for some models on $V = \mathbb{R}^2$ are illustrated.

FIG. 3: [A] A typical Helstrom family of ensembles in
$\mathbb{R}^2$; [B] a Helstrom family of ensembles is not unique;
[C] a Helstrom family of ensembles exists for models S with
an infinite number of extreme points.

Now it is important to ask whether a Helstrom family
of ensembles always exists in any general probabilistic
theory. In this paper, we show that a Helstrom family of
ensembles for a binary state discrimination (N = 2)
always exists in generic cases for both classical and quantum
systems. (For the existence in more general probabilistic
theories, see our forthcoming paper [13].)
Here, by generic cases we mean all the cases except for the
trivial ones where $P_S = \max[p_1, p_2]$ with the trivial
measurement u, i.e., there is no improvement in guessing
beyond the prior knowledge.

First, let us consider a quantum system $S_{\rm qu}$. For
distinct density operators $\rho_1$, $\rho_2$ with a prior
probability distribution $p_1, p_2$, define a Hermitian
operator $X := p_1 \rho_1 - p_2 \rho_2$. Let
$X = \sum_i x_i P_i$ be the spectral decomposition of X. The
positive and negative parts of X are given by
$X_+ := \sum_{i : x_i > 0} x_i P_i$ and
$X_- := \sum_{i : x_i < 0} |x_i| P_i$, satisfying

$$X = X_+ - X_-. \qquad (13)$$

Note that $X_+, X_- \ge 0$, $X_+ X_- = 0$, and
$\|X_+\|_1 - \|X_-\|_1 = \mathrm{tr} X_+ - \mathrm{tr} X_- = p_1 - p_2$.
$X_+$ or $X_-$ might be the zero operator [31], but in that case
the optimization problem is nothing but a trivial case.
Indeed, suppose that $X_- = 0$. Then, for any POVM element E,
it follows
$\mathrm{tr} EX = \mathrm{tr} EX_+ \le \mathrm{tr} X_+ = \mathrm{tr} X = p_1 - p_2$,
and thus $P_S = p_1$ with the trivial measurement I from (3).
A similar argument shows that the case $X_+ = 0$ is again a
trivial case with $P_S = p_2$. Therefore, for any generic case,
we can assume $X_+, X_- \ne 0$, and this makes it possible to
define two density operators by

$$t_1 := \frac{X_-}{\|X_-\|_1}, \qquad t_2 := \frac{X_+}{\|X_+\|_1}. \qquad (14)$$

Notice that they are orthogonal and thus are distinguishable
with certainty. It follows that
$\sup_{0 \le E \le I} \mathrm{tr} EX = \mathrm{tr} X_+ = \|X_+\|_1$,
where the maximum is attained by the projection operator
$P = \sum_{i : x_i > 0} P_i$. From (4), we have

$$P_S^{(Q)} = p_2 + \|X_+\|_1 = p_1 + \|X_-\|_1. \qquad (15)$$

Let $\tilde{p}_i = p_i / P_S^{(Q)}$ (i = 1, 2). It follows
$0 < \tilde{p}_i < 1$ from (15) and
$p_1 / \tilde{p}_1 = p_2 / \tilde{p}_2$ by definition. Finally,
direct calculation using (13), (14) and (15) shows the
equation (6). Therefore, we have obtained [32]

Theorem 2 For any quantum mechanical system, a
Helstrom family of ensembles for a binary state
discrimination exists in any generic case.

As any classical system is embeddable into a quantum
system (into diagonal elements with a fixed basis), we also
have

Theorem 3 For any classical system $S_{\rm cl}$, a Helstrom
family of ensembles for a binary state discrimination
exists in any generic case.

More concretely, for given distinct classical states
$s_1 = (x_i)_{i=1}^{d}$, $s_2 = (y_i)_{i=1}^{d} \in S_{\rm cl}$
($x_i, y_i \ge 0$, $\sum_i x_i = \sum_i y_i = 1$) with a prior
probability distribution $p_1, p_2$, one can define
$t_1 = \frac{1}{\|X_-\|_1} (-\min[X_i, 0])_{i=1}^{d}$ and
$t_2 = \frac{1}{\|X_+\|_1} (\max[X_i, 0])_{i=1}^{d}$, where
$X_i = p_1 x_i - p_2 y_i$,
$\|X_-\|_1 = \sum_{i : X_i < 0} |X_i|$ and
$\|X_+\|_1 = \sum_{i : X_i > 0} X_i$. Finally, $\tilde{p}_i$ is
given by $p_i / P_S = 2 p_i / (1 + \|X_+\|_1 + \|X_-\|_1)$.
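The classical construction can be written out directly. The sketch below is an illustration under stated assumptions: the function name `classical_helstrom_family` and the three-outcome example are hypothetical, and exact fractions are used so that condition (6) can be checked with `==`.

```python
from fractions import Fraction as F

def classical_helstrom_family(s1, s2, p1, p2):
    """Conjugate states, weights and Helstrom bound for two classical states."""
    X = [p1 * x - p2 * y for x, y in zip(s1, s2)]
    norm_plus = sum(v for v in X if v > 0)      # ||X_+||_1
    norm_minus = sum(-v for v in X if v < 0)    # ||X_-||_1
    if norm_plus == 0 or norm_minus == 0:
        raise ValueError("trivial case: P_S = max(p1, p2)")
    t1 = [max(-v, 0) / norm_minus for v in X]   # conjugate state to s1
    t2 = [max(v, 0) / norm_plus for v in X]     # conjugate state to s2
    P_S = (1 + norm_plus + norm_minus) / 2      # Helstrom bound
    return t1, t2, p1 / P_S, p2 / P_S, P_S

def mix(w, a, b):
    """Convex combination w*a + (1-w)*b."""
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

# Hypothetical example with three outcomes and a non-uniform prior.
s1 = [F(1, 2), F(1, 3), F(1, 6)]
s2 = [F(1, 6), F(1, 3), F(1, 2)]
p1, p2 = F(2, 5), F(3, 5)
t1, t2, w1, w2, P_S = classical_helstrom_family(s1, s2, p1, p2)
ref1, ref2 = mix(w1, s1, t1), mix(w2, s2, t2)
```

Since `t1` and `t2` have disjoint supports, they are distinguishable with certainty, so Theorem 1 applies and the Helstrom ratio equals `P_S`.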

In Ref. [18], a family of ensembles as in Theorem 1
(and thus a Helstrom family of ensembles) has been used
in 2-level quantum systems for a binary state
discrimination with a uniform prior distribution
$p_0 = p_1 = 1/2$. The purpose there was to reproduce the
Helstrom bound (4) in 2-level quantum systems (with
$p_0 = p_1 = 1/2$) by resorting to (A) remote state preparation
and (B) the no-signaling condition [33]. Compared to their
results, Theorem 2 shows that a Helstrom family of ensembles
exists not only in 2-level systems with uniform distributions
but also in any quantum system for generic cases. Moreover,
Theorem 1 implies that a logical connection with an optimal
state discrimination already appears through the existence of
a Helstrom family of ensembles, resorting to neither (A) nor
(B); and indeed this appears in any general probabilistic
theory, not only in quantum systems. Of course, this is
consistent with the results in [18], and our result can be
interpreted as a generalization of the results in [18] to
quantum mechanical systems of any dimension and to the
discrimination of any N states.

IV. EXAMPLES 

In this section, we illustrate our method in quantum 
2-level systems (qubit), and also in a simple toy model 
which is neither classical nor quantum. 

A. Quantum 2-level systems 

As is well known, any density operator $\rho$ for a qubit
is represented by a Bloch vector
$b \in D^3 := \{b \in \mathbb{R}^3 \mid \|b\| \le 1\}$ through the
map $b \mapsto \rho(b) = \frac{1}{2}\left(I + \sum_{j=1}^{3} b_j \sigma_j\right)$,
where $\sigma_j$ (j = 1, 2, 3) are the Pauli matrices. Notice
that the trace-norm distance between density operators
coincides with the Euclidean distance in $\mathbb{R}^3$ between
the corresponding Bloch vectors:
$\|\rho(b_1) - \rho(b_2)\|_1 = \|b_1 - b_2\|$.

[Example 3: Binary state discrimination] Let us consider
a state discrimination between $\rho(b_1)$ and $\rho(b_2)$ with
a uniform distribution. Following the geometrical view of a
Helstrom family of ensembles in Theorem 1, one can find
it in the following manner: In order that states
$c_1 \in D^3$ and $c_2 \in D^3$ have parallel supporting
hyperplanes so that they are distinguishable, they should be
on the Bloch sphere (pure states) in opposite directions [34].
Moreover, the line $c_1 - c_2$ has to be parallel to
$b_1 - b_2$ from condition (12). Then, $c_1$ and $c_2$ are
uniquely determined by the points on the intersection of the
Bloch ball and the hyperplane determined by $b_1 - b_2$ and the
origin (see Fig. 4). Then, it is an elementary geometric
problem to obtain the ratio $q = \frac{2}{2 + \|b_1 - b_2\|}$.
Since the Helstrom ratio is given by
$p = p_i / \tilde{p}_i = 1/(2q)$, this reproduces the Helstrom
bound $P_S = \frac{1}{2}\left(1 + \frac{1}{2}\|b_1 - b_2\|\right)$
by use of Theorem 1. Indeed, from (4), the optimal success
probability to discriminate two distinct $\rho_1$ and $\rho_2$
with a uniform prior distribution is

$$P_S^{(Q)} = \frac{1}{2}\left(1 + \frac{1}{2}\|\rho_1 - \rho_2\|_1\right). \qquad (16)$$

(Recall that $\|\rho(b_1) - \rho(b_2)\|_1 = \|b_1 - b_2\|$.)
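The qubit construction can be reproduced numerically from two Bloch vectors. This is a minimal sketch: the function name `qubit_binary_helstrom_family` and the sample Bloch vectors are assumptions for illustration, and the two input vectors are assumed to be distinct points of the Bloch ball.

```python
import math

def qubit_binary_helstrom_family(b1, b2):
    """Helstrom family for two qubit states with uniform prior, in Bloch vectors."""
    diff = [x - y for x, y in zip(b1, b2)]
    dist = math.sqrt(sum(v * v for v in diff))   # ||b1 - b2||, assumed nonzero
    n = [v / dist for v in diff]                 # unit vector parallel to b1 - b2
    c1 = [-v for v in n]                         # conjugate state to b1 (pure)
    c2 = n                                       # conjugate state to b2 (antipodal to c1)
    q = 2 / (2 + dist)                           # common weight of b1 and b2
    ref1 = [q * x + (1 - q) * y for x, y in zip(b1, c1)]
    ref2 = [q * x + (1 - q) * y for x, y in zip(b2, c2)]
    return c1, c2, q, ref1, ref2, 1 / (2 * q)    # last entry: Helstrom ratio p

# Hypothetical example: one pure and one mixed state.
b1, b2 = (0.0, 0.0, 1.0), (0.6, 0.0, 0.0)
c1, c2, q, ref1, ref2, p = qubit_binary_helstrom_family(b1, b2)
```

The two reference states coincide, as required by (12), and the returned ratio agrees with the right-hand side of (16) evaluated through the Bloch-vector distance.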

[Examples 4: A^-numbers of symmetric state discrim- 
ination] In quantum systems, discrimination of N num- 
bers of state is much more difficult problem than bi- 
nary cases. For symmetric quantum (pure) states {pj = 
\ipj)(ipj\}jLi with uniform distribution pi = 1/N, where 
state vectors are given by = V J with a nor- 
malized vector \ip) and a unitary operator satisfying 
V N = exp(ix)l (x e R), Ban et al. [H| obtained the 
optimal success probability: 

pp = mm\ 2 , 
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parallel supporting liypcrplancs 



FIG. 4: 2-dimensional section of the Bloch Ball. 



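where $\Phi := \sum_{j=1}^{N} |\psi_j\rangle\langle\psi_j|$. As a typical example, let
us consider $N$ symmetric states in 2-level systems (as
illustrated in Fig. 5 [A] for the case $N = 8$). Let
$V := \exp(-i\frac{\pi}{N}\sigma_3)$ be a unitary operator which rotates
Bloch vectors by the angle $2\pi/N$ around the $z$ axis
($V^N = -\mathbb{I}$); and let $|\psi\rangle := \cos(\frac{\theta}{2})|0\rangle + \sin(\frac{\theta}{2})|1\rangle$, where
$|0\rangle$, $|1\rangle$ are the eigenvectors of $\sigma_3$ with eigenvalues $1$, $-1$. The
Bloch vector corresponding to $|\psi\rangle$ is $\mathbf{b} = (\sin\theta, 0, \cos\theta)$.

FIG. 5: [A] Symmetric quantum states $\rho_1, \ldots, \rho_8$ in the Bloch
ball; [B] 2-dimensional section where points A, B, C are $\rho_j$,
$\sigma_j$ and $\rho$, respectively ($AC = \frac{\sin\theta}{\cos\xi}$, $AB = 2\sin(\theta + \xi)$, and
thus $q = 1 - \frac{AC}{AB} = 1 - \frac{\sin\theta}{2\sin(\theta + \xi)\cos\xi}$); [C] Helstrom family of
ensembles with conjugate states $\sigma_j$ and the reference state $\rho$.

Then, it follows, up to a global phase,
\[
|\psi_j\rangle = \cos\bigl(\tfrac{\theta}{2}\bigr)|0\rangle + \sin\bigl(\tfrac{\theta}{2}\bigr) e^{i\frac{2\pi(j-1)}{N}}|1\rangle, \tag{17}
\]
for $j = 1, \ldots, N$, with the corresponding Bloch vectors
$\mathbf{b}^{(j)} = \bigl(\sin\theta\cos\frac{2\pi(j-1)}{N}, \sin\theta\sin\frac{2\pi(j-1)}{N}, \cos\theta\bigr)$. It is easy
to show $\Phi = \frac{N}{2}(\mathbb{I} + \cos\theta\,\sigma_3)$ [35], and the optimal success
probability is
\[
P_S^{(Q)} = \frac{1}{N}(1 + \sin\theta). \tag{18}
\]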
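The step from the general formula $|\langle\psi|\Phi^{-1/2}|\psi\rangle|^2$ to the closed form (18) can be checked numerically. The sketch below is an illustration under stated assumptions: it builds the states of Eq. (17) as pairs of complex amplitudes, assumes $N \ge 2$ and $0 < \theta < \pi$ so that $\Phi$ is diagonal and invertible, and uses hypothetical function names.

```python
import cmath
import math

def symmetric_states(n_states, theta):
    """State vectors |psi_j> of Eq. (17) as pairs of amplitudes on |0>, |1>."""
    return [
        (math.cos(theta / 2),
         math.sin(theta / 2) * cmath.exp(2j * math.pi * j / n_states))
        for j in range(n_states)      # j here plays the role of j - 1 in the text
    ]

def ban_success_probability(n_states, theta):
    """Evaluate |<psi| Phi^{-1/2} |psi>|^2 for Phi = sum_j |psi_j><psi_j|."""
    states = symmetric_states(n_states, theta)
    # Entries of the 2x2 matrix Phi in the basis {|0>, |1>}.
    phi00 = sum(abs(a) ** 2 for a, _ in states)
    phi11 = sum(abs(b) ** 2 for _, b in states)
    phi01 = sum(a * b.conjugate() for a, b in states)
    # Rotational symmetry around the z axis makes Phi diagonal (for N >= 2).
    assert abs(phi01) < 1e-9, "Phi should be diagonal by rotational symmetry"
    a, b = states[0]                  # the seed vector |psi>
    amplitude = abs(a) ** 2 / math.sqrt(phi00) + abs(b) ** 2 / math.sqrt(phi11)
    return amplitude ** 2

n_states, theta = 8, 0.7              # sample values chosen for illustration
print(ban_success_probability(n_states, theta), (1 + math.sin(theta)) / n_states)
```

The two printed values are expected to agree, because the diagonal entries of $\Phi$ are $N\cos^2(\theta/2)$ and $N\sin^2(\theta/2)$.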



In the following, we apply our method and show that
there exists a Helstrom family of ensembles for this problem
with any $N$, and thus reproduce the success probability (18).
(In the following, we identify the density
operator $\rho_j$, the state vector $\psi_j$, and its Bloch vector $\mathbf{b}_j$.)

First, from the symmetry and the geometrical viewpoint
of a weak Helstrom family of ensembles, it is clear that
a weak Helstrom family for $\{\rho_j = |\psi_j\rangle\langle\psi_j|\}_{j=1}^{N}$ and $p_i = 1/N$
can be constructed as follows: In the Bloch ball,
draw lines from each $\rho_j$ to a point on the $z$-axis, say
point C, and extend the lines until they arrive at the
Bloch sphere, and let the conjugate states $\sigma_j$ be the endpoints
of the lines from $\rho_j$. Fig. 5 [B] shows one of the
2-dimensional sections of the Bloch ball where the points
A and B are the corresponding $\rho_j$ and $\sigma_j$. Then, we have
obtained a weak Helstrom family of ensembles $\{q_\xi, \rho_j;\, 1 - q_\xi, \sigma_j\}$,
where $q_\xi$ is the ratio $\overline{BC}/\overline{AB}$ and where we explicitly write
the dependence on the angle $\xi = \angle DAB$, so that the
reference state $\rho$ is the point C:
\[
\rho = q_\xi\,\rho_j + (1 - q_\xi)\,\sigma_j \quad (j = 1, \ldots, N).
\]



Note that we have a bound $P_S \le p = \frac{1}{N q_\xi}$ from Proposition 1.
Therefore, in order to obtain a tighter bound on
$P_S$, we would like to find a weak Helstrom family with
$q_\xi$ as large as possible. It is again a simple geometric
problem to obtain $q_\xi = 1 - \frac{\sin\theta}{\sin(\theta + 2\xi) + \sin\theta}$ (see
the caption of Fig. 5 [B]), which takes the maximum
$q_{\xi_m} = \frac{1}{1 + \sin\theta}$ at $\xi_m = \frac{\pi}{4} - \frac{\theta}{2}$ ($= \angle DAE$) (see Fig. 5
[C]). This attains the tight bound (18), and thus we have
demonstrated that our method reproduces the optimal
success probability. Indeed, we can show that this weak
Helstrom family of ensembles is a Helstrom family from
Proposition 2: note that $\sigma_j = |\phi_j\rangle\langle\phi_j|$ at $\xi_m$ is given by
\[
|\phi_j\rangle = \cos\bigl(\tfrac{\pi}{4}\bigr)|0\rangle + \sin\bigl(\tfrac{\pi}{4}\bigr) e^{i\left(\frac{2\pi(j-1)}{N} + \pi\right)}|1\rangle. \tag{19}
\]



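Let $|\chi_j\rangle := \cos(\frac{\pi}{4})|0\rangle + \sin(\frac{\pi}{4}) e^{i\frac{2\pi(j-1)}{N}}|1\rangle$, which is
orthogonal to $|\phi_j\rangle$ for all $j = 1, \ldots, N$. Then, it follows that
$\sum_{j=1}^{N} |\chi_j\rangle\langle\chi_j| = \frac{N}{2}\mathbb{I}$, and thus $\{E_j := \frac{2}{N}|\chi_j\rangle\langle\chi_j|\}$ is a
POVM which satisfies the condition $\mathrm{tr}\, E_j\sigma_j = 0$ in Proposition 2.
Consequently, we have found a Helstrom family
of ensembles and thus obtained the Helstrom bound.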
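The construction above can be illustrated numerically. The sketch below (function names, the grid search and the sample values $\theta = 0.7$, $N = 8$ are assumptions made for illustration) maximizes $q_\xi$ over a grid of angles and compares the resulting bound $1/(N q_{\xi_m})$ with the success probability actually achieved by the POVM $\{E_j\}$.

```python
import cmath
import math

def q_ratio(theta, xi):
    """q_xi = 1 - AC/AB with AC = sin(theta)/cos(xi) and AB = 2 sin(theta + xi)."""
    return 1.0 - math.sin(theta) / (2.0 * math.sin(theta + xi) * math.cos(xi))

def povm_success(n_states, theta):
    """Average success probability of E_j = (2/N)|chi_j><chi_j| on the states rho_j."""
    total = 0.0
    for j in range(n_states):
        phase = cmath.exp(2j * math.pi * j / n_states)
        psi = (math.cos(theta / 2), math.sin(theta / 2) * phase)
        chi = (math.cos(math.pi / 4), math.sin(math.pi / 4) * phase)
        overlap = chi[0] * psi[0] + chi[1].conjugate() * psi[1]    # <chi_j|psi_j>
        total += (2.0 / n_states) * abs(overlap) ** 2 / n_states   # prior 1/N
    return total

theta, n_states = 0.7, 8
xi_m = math.pi / 4 - theta / 2               # maximizer stated in the text
grid = [k * 1e-4 for k in range(1, 8000)]    # xi sampled in (0, 0.8)
best = max(grid, key=lambda xi: q_ratio(theta, xi))
print(best, xi_m)
print(1.0 / (n_states * q_ratio(theta, xi_m)), povm_success(n_states, theta))
```

If the reconstruction is right, the grid maximizer lies close to $\xi_m$ and both numbers on the second line equal $(1 + \sin\theta)/N$, which is the statement that the bound from Proposition 1 is attained.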



B. Probabilistic Model with square-state space 

As an example which is neither a classical nor a quantum
system, let us consider a general probabilistic model
with square-state space $\mathcal{S}_{sq} := \{(x_1, x_2) \in \mathbb{R}^2 \mid 0 \le x_i \le 1\ (i = 1, 2)\}$
(Fig. 6). This can be considered as the simplest
nontrivial model which is neither a classical nor a quantum
system. It should be noticed that this is not just a toy
model: one can show that this probabilistic model
can be physically realized from a classical system under
a certain restriction of measurements [Tl[ [2f| . It is obvious
that $\mathcal{S}_{sq}$ is a compact convex subset of $\mathbb{R}^2$ with 4
pure states:
\[
s^{(00)} = (0, 0),\quad s^{(01)} = (0, 1),\quad s^{(10)} = (1, 0),\quad s^{(11)} = (1, 1). \tag{20}
\]

FIG. 6: Probabilistic model with square-state space (figure labels: parallel supporting hyperplanes).

[Example 5: Binary state discrimination] Let us consider
a binary state discrimination problem for two distinct
states $s_1 = (x_1, y_1)$, $s_2 = (x_2, y_2) \in \mathcal{S}_{sq}$ with uniform
distribution. Without loss of generality, let $x_1 \le x_2$.
There are two cases: (a) $\angle(s_2 - s_1, s^{(10)} - s^{(00)}) \le \pi/4$
or (b) $\angle(s_2 - s_1, s^{(10)} - s^{(00)}) > \pi/4$, where
$\angle(a, b) := \arccos\bigl(\frac{a \cdot b}{\sqrt{a \cdot a}\sqrt{b \cdot b}}\bigr)$ is the angle between two vectors $a$ and
$b$. In case (a), clearly there exist conjugate states $t_1$
and $t_2$ on line $s^{(11)} - s^{(10)}$ and line $s^{(01)} - s^{(00)}$, respectively,
such that $t_1 - t_2$ is parallel to $s_1 - s_2$. Since
there exist parallel supporting hyperplanes on $t_1$ and $t_2$
(see Fig. 6 (a)), we have a Helstrom family from Theorem 1.
Then, it is an elementary calculation to find
$q = \frac{1}{1 + |x_2 - x_1|}$, and hence the optimal success probability
is $P_S = \frac{1}{2}(1 + |x_2 - x_1|)$. Similarly in case (b), we have
a Helstrom family and the optimal success probability is
given by $P_S = \frac{1}{2}(1 + |y_2 - y_1|)$ (see Fig. 6 (b)).
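The two cases can be cross-checked with a small sketch. It relies on an assumption not spelled out in the text: that the extreme effects of the square-state space, apart from the zero and unit effects, are the four coordinate functionals $x$, $1 - x$, $y$, $1 - y$, so that an optimal two-outcome observable can be searched among them. Function names and sample states are hypothetical.

```python
def square_binary_success(s1, s2):
    """Best success probability over two-outcome observables {e, 1 - e}."""
    (x1, y1), (x2, y2) = s1, s2
    # Assumed list of nontrivial extreme effects on the unit square.
    effects = [
        lambda x, y: x,
        lambda x, y: 1 - x,
        lambda x, y: y,
        lambda x, y: 1 - y,
    ]
    # Guess s1 on outcome e and s2 on outcome 1 - e, with uniform prior 1/2.
    return max(0.5 * (e(x1, y1) + 1 - e(x2, y2)) for e in effects)

def helstrom_ratio_success(s1, s2):
    """Same quantity from the Helstrom ratio p = 1/(2q) in cases (a) and (b)."""
    dx, dy = abs(s2[0] - s1[0]), abs(s2[1] - s1[1])
    gap = dx if dx >= dy else dy     # case (a): angle <= pi/4, otherwise case (b)
    q = 1.0 / (1.0 + gap)
    return 1.0 / (2.0 * q)

# Sample states: the first pair falls in case (a), the second in case (b).
print(square_binary_success((0.2, 0.3), (0.9, 0.5)), helstrom_ratio_success((0.2, 0.3), (0.9, 0.5)))
print(square_binary_success((0.4, 0.1), (0.5, 0.9)), helstrom_ratio_success((0.4, 0.1), (0.5, 0.9)))
```

Restricting the search to extreme effects is enough here because the success probability is linear in the effect $e$.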

[Example 6: State discrimination of pure states] Since
$\mathcal{S}_{sq}$ is not a simplex, and thus not a classical system,
the four pure states (20) cannot be discriminated in a single
measurement. Let us obtain the optimal success probability
to distinguish all the pure states with uniform
distribution. From a geometrical consideration, one has
to find as large a polygon as possible in $\mathcal{S}_{sq}$ which is
reverse homothetic to $\mathrm{conv}_{i,j=0,1}[s^{(ij)}] = \mathcal{S}_{sq}$. Clearly, it
is $\mathcal{S}_{sq}$ itself, with the similarity point at the center of
$\mathcal{S}_{sq}$. More precisely, one can choose conjugate states
$t^{(ij)} = s^{(i\oplus 1,\, j\oplus 1)}$, where $\oplus$ denotes the exclusive OR,
and $q = 1/2$. Therefore, we obtain a weak Helstrom
family with the Helstrom ratio $p = \frac{1}{4q} = 1/2$. It turns
out that this weak Helstrom family is a Helstrom family,
and thus we obtain $P_S = 1/2$ to discriminate all
pure states in this system. Indeed, it is easy to see
that the affine functionals $e^{(ij)}$ ($i, j = 0, 1$) on $\mathcal{S}_{sq}$ defined
by $e^{(ij)}(t^{(ij)}) = 0$, $e^{(ij)}(t^{(i\oplus 1,\, j\oplus 1)}) = 1/2$ (and hence satisfying
$e^{(ij)}(t^{(i\oplus 1,\, j)}) = e^{(ij)}(t^{(i,\, j\oplus 1)}) = 1/4$) for any $i, j = 0, 1$
form a 4-valued observable $\{e^{(ij)}\}$ on $\mathcal{S}_{sq}$. This satisfies
the sufficient condition in Proposition 2.
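The properties claimed for the observable $\{e^{(ij)}\}$ can be verified directly. The sketch below uses one explicit affine form, $e^{(ij)}(x, y) = \frac{1}{2} - \frac{1}{4}(|x - i| + |y - j|)$, which is an assumption chosen to match the corner values stated above; the names `effect`, `conjugate` and `CORNERS` are hypothetical.

```python
from itertools import product

CORNERS = list(product((0, 1), repeat=2))   # labels (i, j), also the pure states s^(ij)

def effect(i, j, x, y):
    """Affine functional e^(ij): 1/2 at s^(ij), 0 at the opposite corner."""
    dist_x = x if i == 0 else 1 - x         # equals |x - i| on the unit square
    dist_y = y if j == 0 else 1 - y         # equals |y - j| on the unit square
    return 0.5 - 0.25 * (dist_x + dist_y)

def conjugate(i, j):
    """Conjugate state t^(ij) = s^(i xor 1, j xor 1)."""
    return (i ^ 1, j ^ 1)

# Average success probability for the four pure states with uniform prior 1/4.
success = sum(0.25 * effect(i, j, i, j) for i, j in CORNERS)
print(success)
```

Since an affine functional on the square is fixed by its values on the corners, checking normalization and positivity at the four pure states is sufficient for the whole state space.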



V. CONCLUSION 

In this paper, we introduced the notion of a (weak) Helstrom
family of ensembles in general probabilistic theories
and showed its close relation to state discrimination
problems. Basically, a Helstrom family can be
searched for by means of geometry, and once we have the
family, or at least a nontrivial weak family, the optimal
success probability, or a bound on it, is automatically obtained
from the Helstrom ratio. In binary state discrimination,
a weak Helstrom family of ensembles with distinguishable
conjugate states is shown to be a Helstrom
family, which again has a simple geometrical interpretation.
We illustrated our method in 2-level quantum systems
and reproduced the Helstrom bound (16) for binary
state discrimination and the optimal success probability (18)
for symmetric quantum states.
As a nontrivial general probabilistic theory, a probabilistic
model with square-state space was investigated, and the
optimal success probabilities for binary state discrimination
and pure state discrimination were obtained using our method.
In this paper, we showed the existence of Helstrom families of
ensembles analytically in both classical and quantum theory in
any generic case of binary state discrimination. More
general models will be investigated in our forthcoming
paper [22]. There, we will also clarify the relation
between our method and linear programming problems.
with finitely many outputs, since this is enough
for our purpose of discriminating $N$ states. Note
that it is straightforward to formalize general observables
in measure-theoretic language.

A subset $C$ in a real vector space $V$ is called convex if
$\lambda s + (1 - \lambda)t \in C$ for any $s, t \in C$ and $\lambda \in [0, 1]$.
If $s \in C$ does not have a nontrivial convex combination
in $C$, i.e., if $s = \lambda t + (1 - \lambda)u$ for some $t, u \in C$ and
$\lambda \in (0, 1)$ implies $s = t = u$, then $s$ is called an extreme
point.

A convex polytope $C = \mathrm{conv}_{i=1,\ldots,N}\{c_i\} =
\{\sum_{i=1}^{N} p_i c_i \mid p_i \ge 0, \sum_i p_i = 1\} \subset V$ with $N$
extreme points $c_i \in V$ is called a simplex if any
element $c \in C$ has a unique convex combination with
respect to the $c_i$'s. Equivalently, $C$ is called a simplex iff the
affine dimension of $C$ is $N - 1$.

Note that the observable here is the so-called POVM
(positive-operator-valued measure), and therefore what
is usually called an observable in standard textbooks of
quantum mechanics, which is characterized by a Hermitian
operator, is a special observable in this paper.
Note that this happens even when <p\,pi < 1. 
Although we have explained this for finite-level quantum systems, it
is straightforward to generalize to any quantum mechanical
system with a Hilbert space of countably infinite
dimension. (Notice that $X$ is a trace class operator on $\mathcal{H}$
and thus has a discrete spectral decomposition.)
[(A) Remote State Preparation] Let $\{p_i, \rho_i\}$ and
$\{q_j, \sigma_j\}$ be two ensembles in a quantum system ($\rho_i, \sigma_j \in
\mathcal{S}_{qu}$, $p_i, q_j \ge 0$, $\sum_i p_i = \sum_j q_j = 1$) which satisfy
$\sum_i p_i\rho_i = \sum_j q_j\sigma_j$. Then, there exist a Hilbert space $\mathcal{K}$,
a state $\tau$ on $\mathcal{H} \otimes \mathcal{K}$, and local measurements $M_1$ and $M_2$
on $\mathcal{K}$, such that the ensembles $\{p_i, \rho_i\}$ and $\{q_j, \sigma_j\}$ can be
remotely prepared by measuring $M_1$ and $M_2$ under the
state $\tau$. [(B) No-signaling condition] No information
is transmitted instantaneously by a local measurement.
In quantum systems, it is well known that both (A) and
(B) hold.

Indeed, this is equivalent to the orthogonality between
$\rho(\mathbf{c}_1)$ and $\rho(\mathbf{c}_2)$.

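For instance, one can make use of the Bloch
vector representation: $\Phi := \sum_{j=1}^{N} |\psi_j\rangle\langle\psi_j| =
\frac{1}{2}\sum_{j=1}^{N}\bigl(\mathbb{I} + \sum_{k=1}^{3} b_k^{(j)}\sigma_k\bigr) = \frac{N}{2}(\mathbb{I} + \cos\theta\,\sigma_3)$, since
$\sum_j b_1^{(j)}$ and $\sum_j b_2^{(j)}$ vanish from the rotational
symmetry around the $z$-axis.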



